Instructor names and contact information
- Benjamin P. Berman
- Tiago C. Silva
Workshop Description
This workshop demonstrates how to perform ELMER analysis using matched RNA-seq and DNA methylation data.
You can find a detailed information about ELMER at http://bioconductor.org/packages/3.10/bioc/vignettes/ELMER/inst/doc/index.html.
Articles about ELMER:
- Tiago C Silva, Simon G Coetzee, Nicole Gull, Lijing Yao, Dennis J Hazelett, Houtan Noushmehr, De-Chen Lin, Benjamin P Berman, ELMER v.2: an R/Bioconductor package to reconstruct gene regulatory networks from DNA methylation and transcriptome profiles, Bioinformatics, Volume 35, Issue 11, 1 June 2019, Pages 1974–1977, https://doi.org/10.1093/bioinformatics/bty902
- Yao, Lijing, et al. “Inferring regulatory element landscapes and transcription factor networks from cancer methylomes.” Genome biology 16.1 (2015): 105. https://doi.org/10.1186/s13059-015-0668-3
- Yao, Lijing, Benjamin P. Berman, and Peggy J. Farnham. “Demystifying the secret mission of enhancers: linking distal regulatory elements to target genes.” Critical reviews in biochemistry and molecular biology 50.6 (2015): 550-573. https://doi.org/10.3109/10409238.2015.1087961
Pre-requisites
- Basic knowledge of R syntax
- Familiarity with the SummarizedExperiment classes
- Familiarity with ’omics data types including DNA methylation and gene expression
Workshop Participation
Students will have a chance to run ELMER analysis on a provided MultiAssayExperiment object created from TCGA data from GDC data portal.
Goals and objectives
- gain familiarity ELMER input data, a MultiAssayExperiment object
- Execute ELMER analysis on real data and understand its meaning
R/Bioconductor packages used
Retrieving the data
The RNA-seq data, DNA methylation data and patients metadata (i.e age, gender) used in this workshop is structured as a MultiAssayExperiment. You can read more about a MultiAssayExperiment object at http://bioconductor.org/packages/MultiAssayExperiment/ and https://bioconductor.github.io/BiocWorkshops/workflow-for-multi-omics-analysis-with-multiassayexperiment.html.
Through the next section we will:
- load the data
- Verify the RNA-seq data
- Verify the DNA methylation data
- Verify the samples metadata
Download the data
The data is available in this google drive.
Loading the data
## A MultiAssayExperiment object of 2 listed
## experiments with user-defined names and respective classes.
## Containing an ExperimentList class object of length 2:
## [1] DNA methylation: RangedSummarizedExperiment with 148951 rows and 171 columns
## [2] Gene expression: RangedSummarizedExperiment with 56830 rows and 171 columns
## Features:
## experiments() - obtain the ExperimentList instance
## colData() - the primary/phenotype DataFrame
## sampleMap() - the sample availability DataFrame
## `$`, `[`, `[[` - extract colData columns, subset, or experiment
## *Format() - convert into a long or wide DataFrame
## assays() - convert ExperimentList to a SimpleList of matrices
Verify the RNA-seq data
## ExperimentList class object of length 2:
## [1] DNA methylation: RangedSummarizedExperiment with 148951 rows and 171 columns
## [2] Gene expression: RangedSummarizedExperiment with 56830 rows and 171 columns
## [1] 56830
## [1] 171
## GRanges object with 56830 ranges and 3 metadata columns:
## seqnames ranges strand | ensembl_gene_id
## <Rle> <IRanges> <Rle> | <character>
## ENSG00000000003 chrX 100627109-100639991 - | ENSG00000000003
## ENSG00000000005 chrX 100584802-100599885 + | ENSG00000000005
## ENSG00000000419 chr20 50934867-50958555 - | ENSG00000000419
## ENSG00000000457 chr1 169849631-169894267 - | ENSG00000000457
## ENSG00000000460 chr1 169662007-169854080 + | ENSG00000000460
## ... ... ... ... . ...
## ENSG00000281904 chr2 90365737-90367699 + | ENSG00000281904
## ENSG00000281909 chr15 22480439-22484840 - | ENSG00000281909
## ENSG00000281910 chr16 58559796-58559931 - | ENSG00000281910
## ENSG00000281912 chr1 45303910-45305619 + | ENSG00000281912
## ENSG00000281920 chr2 65623272-65628424 + | ENSG00000281920
## external_gene_name original_ensembl_gene_id
## <character> <character>
## ENSG00000000003 TSPAN6 ENSG00000000003.13
## ENSG00000000005 TNMD ENSG00000000005.5
## ENSG00000000419 DPM1 ENSG00000000419.11
## ENSG00000000457 SCYL3 ENSG00000000457.12
## ENSG00000000460 C1orf112 ENSG00000000460.15
## ... ... ...
## ENSG00000281904 AC233263.6 ENSG00000281904.1
## ENSG00000281909 HERC2P7 ENSG00000281909.1
## ENSG00000281910 SNORA50A ENSG00000281910.1
## ENSG00000281912 LINC01144 ENSG00000281912.1
## ENSG00000281920 AC007389.5 ENSG00000281920.1
## -------
## seqinfo: 24 sequences from an unspecified genome; no seqlengths
## TCGA-2H-A9GF-01A-11R-A37I-31 TCGA-2H-A9GG-01A-11R-A37I-31
## ENSG00000000003 17.63165 16.758267
## ENSG00000000005 0.00000 7.721927
## ENSG00000000419 19.33125 19.188030
## ENSG00000000457 15.87565 16.251205
## TCGA-2H-A9GH-01A-11R-A37I-31 TCGA-2H-A9GI-01A-11R-A37I-31
## ENSG00000000003 17.624508 17.89604
## ENSG00000000005 9.236257 0.00000
## ENSG00000000419 19.765411 20.90188
## ENSG00000000457 15.745243 15.92405
Verify the DNA methylation data
## ExperimentList class object of length 2:
## [1] DNA methylation: RangedSummarizedExperiment with 148951 rows and 171 columns
## [2] Gene expression: RangedSummarizedExperiment with 56830 rows and 171 columns
## [1] 148951
## [1] 171
## GRanges object with 148951 ranges and 4 metadata columns:
## seqnames ranges strand | address_A address_B
## <Rle> <IRanges> <Rle> | <integer> <integer>
## cg08258224 chr1 864703-864704 - | 23712478 <NA>
## cg13938959 chr1 898803-898804 + | 38601418 <NA>
## cg12445832 chr1 898915-898916 - | 70726486 <NA>
## cg23999112 chr1 898976-898977 - | 49747457 <NA>
## cg11527153 chr1 902156-902157 + | 47662324 <NA>
## ... ... ... ... . ... ...
## cg16428758 chrX 155539968-155539969 - | 28623392 <NA>
## cg05682970 chrX 155609878-155609879 - | 34723345 <NA>
## cg07211220 chrX 155616254-155616255 + | 33714463 <NA>
## cg25059696 chrY 7563149-7563150 + | 61613414 <NA>
## cg13851368 chrY 11954167-11954168 + | 36601306 32634316
## channel designType
## <character> <character>
## cg08258224 Both II
## cg13938959 Both II
## cg12445832 Both II
## cg23999112 Both II
## cg11527153 Both II
## ... ... ...
## cg16428758 Both II
## cg05682970 Both II
## cg07211220 Both II
## cg25059696 Both II
## cg13851368 Grn I
## -------
## seqinfo: 26 sequences from an unspecified genome; no seqlengths
## TCGA-2H-A9GF-01A-11D-A37D-05 TCGA-2H-A9GG-01A-11D-A37D-05
## cg08258224 0.9068315 0.8888116
## cg13938959 0.2791042 0.2178040
## cg12445832 0.1900486 0.1202698
## cg23999112 0.3198171 0.1969021
## TCGA-2H-A9GH-01A-11D-A37D-05 TCGA-2H-A9GI-01A-11D-A37D-05
## cg08258224 0.8247016 0.8174693
## cg13938959 0.3241953 0.3408468
## cg12445832 0.2290422 0.1741125
## cg23999112 0.4268968 0.4687823
Verify the samples metadata
# metadata can be accessed using the function colData
# we will check the first 4 samples and their 4 columns
colData(mae)[1:4,1:4]## DataFrame with 4 rows and 4 columns
## sample patient shortLetterCode
## <character> <character> <character>
## TCGA-2H-A9GF-01A TCGA-2H-A9GF-01A TCGA-2H-A9GF TP
## TCGA-2H-A9GG-01A TCGA-2H-A9GG-01A TCGA-2H-A9GG TP
## TCGA-2H-A9GH-01A TCGA-2H-A9GH-01A TCGA-2H-A9GH TP
## TCGA-2H-A9GI-01A TCGA-2H-A9GI-01A TCGA-2H-A9GI TP
## definition
## <character>
## TCGA-2H-A9GF-01A Primary solid Tumor
## TCGA-2H-A9GG-01A Primary solid Tumor
## TCGA-2H-A9GH-01A Primary solid Tumor
## TCGA-2H-A9GI-01A Primary solid Tumor
## [1] "Adenocarcinoma, NOS" "Adenocarcinoma, NOS" "Adenocarcinoma, NOS"
## [4] "Adenocarcinoma, NOS"
##
## Metastatic Primary solid Tumor Solid Tissue Normal
## 1 161 9
# or summarize the numbers of samples using two features
table(mae$primary_diagnosis,mae$definition)##
## Metastatic
## Adenocarcinoma, NOS 0
## Basaloid squamous cell carcinoma 0
## Mucinous adenocarcinoma 0
## Squamous cell carcinoma, keratinizing, NOS 0
## Squamous cell carcinoma, NOS 1
## Tubular adenocarcinoma 0
##
## Primary solid Tumor
## Adenocarcinoma, NOS 78
## Basaloid squamous cell carcinoma 1
## Mucinous adenocarcinoma 1
## Squamous cell carcinoma, keratinizing, NOS 4
## Squamous cell carcinoma, NOS 76
## Tubular adenocarcinoma 1
##
## Solid Tissue Normal
## Adenocarcinoma, NOS 7
## Basaloid squamous cell carcinoma 0
## Mucinous adenocarcinoma 1
## Squamous cell carcinoma, keratinizing, NOS 0
## Squamous cell carcinoma, NOS 1
## Tubular adenocarcinoma 0
Selecting the samples
Since we will use only Primary solid Tumor samples we will remove the Metastatic and Solid Tissue Normal samples from our object
Performing ELMER analysis
We will first define some parameters used in the majority of ELMER function
# which is the column of the samples metadata defining our groups
group.col <- "primary_diagnosis"
# which are the grops withing the group.col that we want to compare
group1 <- "Adenocarcinoma, NOS"
group2 <- "Squamous cell carcinoma, NOS"
# Identify probes hypo in group1 or hypo in group2 ?
direction <- "hypo" # hypo in group 1
cores <- 2
mode <- "supervised"
# Where do we want to save the results ?
dir.out <- paste0("analysis/",group.col,"-",
gsub("[[:punct:]]| ", "_", group1),
"_vs_",
gsub("[[:punct:]]| ", "_", group2),
"_", mode,
"/", direction)
dir.create(dir.out, showWarnings = FALSE,recursive = T)Identifying hypo methylated distal probes between ESAD samples compared to ESCC samples
# time expected with one core: 4 minuntes
diff.probes <- get.diff.meth(data = mae,
group.col = group.col,
group1 = group1,
group2 = group2,
diff.dir = direction, # Get probes hypometh. in group 1
cores = cores,
mode = mode,
pvalue = 0.01,
sig.dif = 0.3,
dir.out = dir.out,
save = TRUE)## Supervised mode will use all samples from boths groups. Setting argument minSubgroupFrac to 1
## ELMER will search for probes hypomethylated in group Adenocarcinoma, NOS (n:78) compared to Squamous cell carcinoma, NOS (n:76)
## ooo Arguments ooo
## o Number of probes: 148951
## o Beta value difference cut-off: 0.3
## o FDR cut-off: 0.01
## o Mode: supervised
## o % of samples per group in each comparison: 1
## o Min number of samples per group in each comparison: 5
## o Nb of samples group1 in each comparison: 78
## o Nb of samples group2 in each comparison: 76
## Output direction: analysis/primary_diagnosis-Adenocarcinoma__NOS_vs_Squamous_cell_carcinoma__NOS_supervised/hypo
## ooooooooooooooooo
## Progress disabled when using parallel plyr
## Saving results
## Number of relevant probes found: 1524
## probe pvalue
## cg19860160 cg19860160 3.178629e-41
## cg02838683 cg02838683 7.322499e-46
## cg22512779 cg22512779 1.305816e-45
## cg17094363 cg17094363 4.500204e-35
## cg20429981 cg20429981 1.834374e-22
## cg01189072 cg01189072 9.596881e-18
## Adenocarcinoma..NOS_Minus_Squamous.cell.carcinoma..NOS
## cg19860160 -0.4256266
## cg02838683 -0.3138876
## cg22512779 -0.3538253
## cg17094363 -0.4751664
## cg20429981 -0.3227621
## cg01189072 -0.3771010
## adjust.p
## cg19860160 4.782424e-38
## cg02838683 2.423763e-42
## cg22512779 4.225151e-42
## cg17094363 2.376985e-32
## cg20429981 1.790510e-20
## cg01189072 5.028016e-16
Correlate RNA-seq expression and DNA methylation
Now that we have our hypomethylated probes we will look for the 20 nearest genes
to each probes to identify if any of these genes could being affected by the hypomethylation in the distal regulatory region.
# time expected less than one minute
nearGenes <- GetNearGenes(data = mae,
probes = diff.probes$probe,
numFlankingGenes = 20)## Searching for the 20 near genes
## Identifying gene position for each probe
## # A tibble: 6 x 5
## # Groups: ID [1]
## ID GeneID Symbol Distance Side
## <chr> <chr> <chr> <dbl> <chr>
## 1 cg00009553 ENSG00000261590 AC018554.2 -1462705 L10
## 2 cg00009553 ENSG00000278499 AC018554.3 -1460699 L9
## 3 cg00009553 ENSG00000261436 AC018554.1 -1377934 L8
## 4 cg00009553 ENSG00000261310 AC009081.1 -1296929 L7
## 5 cg00009553 ENSG00000259844 GNPATP -1163530 L6
## 6 cg00009553 ENSG00000261178 AC009169.1 -891638 L5
Now that we have our hypomethylated probes and the 20 nearest genes we want to identify if any of these genes could being affected by the hypomethylatio in the distal regulatory region.
# time expected one minute
pair <- get.pair(data = mae,
nearGenes = nearGenes,
group.col = group.col,
group1 = group1,
group2 = group2,
mode = mode,
diff.dir = direction,
raw.pvalue = 0.001,
Pe = 0.001,
filter.probes = TRUE,
filter.percentage = 0.05,
filter.portion = 0.3,
dir.out = dir.out,
cores = cores,
label = direction)## Using pre-defined groups. U (unmethylated): Adenocarcinoma, NOS, M (methylated): Squamous cell carcinoma, NOS
## -------------------
## * Filtering probes
## -------------------
## For more information see function preAssociationProbeFiltering
## Making sure we have at least 5% of beta values lesser than 0.3 and 5% of beta values greater 0.3.
## Removing 100701 probes out of 148951
## Calculating Pp (probe - gene) for all nearby genes
## Progress disabled when using parallel plyr
## File created: analysis/primary_diagnosis-Adenocarcinoma__NOS_vs_Squamous_cell_carcinoma__NOS_supervised/hypo/getPair.hypo.all.pairs.statistic.csv
## Probe GeneID Symbol Distance
## cg13100118.ENSG00000178828 cg13100118 ENSG00000178828 RNF186 345152
## cg19533443.ENSG00000173702 cg19533443 ENSG00000173702 MUC13 -38437
## cg14981432.ENSG00000124920 cg14981432 ENSG00000124920 MYRF 0
## cg26545245.ENSG00000166866 cg26545245 ENSG00000166866 MYO1A 40016
## cg17277896.ENSG00000143375 cg17277896 ENSG00000143375 CGN 37443
## cg21157873.ENSG00000143375 cg21157873 ENSG00000143375 CGN 0
## Sides Raw.p FDR
## cg13100118.ENSG00000178828 R9 4.334152e-27 3.345954e-24
## cg19533443.ENSG00000173702 L2 6.023782e-27 3.345954e-24
## cg14981432.ENSG00000124920 L1 7.607601e-27 3.345954e-24
## cg26545245.ENSG00000166866 R4 7.909052e-27 3.345954e-24
## cg17277896.ENSG00000143375 R1 8.886298e-27 3.345954e-24
## cg21157873.ENSG00000143375 L1 8.886298e-27 3.345954e-24
We can plot the correlation heatmap of DNA methylation and gene expression with the heatmapPairs function. We will plot only the first 100 pairs.
heatmapPairs(data = mae,
group.col = group.col,
group1 = group1,
group2 = group2,
subset = TRUE,
pairs = pair[1:100,],
filename = NULL)## Subsetting
## Ordering groups
Identifying TF enriched motifs on the hypomethylated regions
We have hypomethylated regions that has a distal gene upregulated. We want to identify which is the MR TF binding to those regions and regulating the distal gene. First we identify which are the enriched motifs for those hypomethylated regions using the function get.enriched.motif. This will output the enriched motifs and the respective hypomethylated probes with the motif signature around it (since a DNA methylation probes is 1bp, we extended the search window to \(\pm\) 250 bp).
# time expected less than one minute
enriched.motif <- get.enriched.motif(data = mae,
probes = unique(pair$Probe),
dir.out = dir.out,
label = direction,
min.incidence = 10,
lower.OR = 1.5)## [1] "GATA4_HUMAN.H11MO.0.A" "GATA6_HUMAN.H11MO.0.A" "HNF4A_HUMAN.H11MO.0.A"
## [4] "GATA2_HUMAN.H11MO.1.A"
# and the paired hypomethylated probes with the motif signature
enriched.motif$GATA4_HUMAN.H11MO.0.A## [1] "cg22050733" "cg04275506" "cg19860160" "cg22512779" "cg19626253"
## [6] "cg03601886" "cg13784017" "cg11959316" "cg07191152" "cg25098793"
## [11] "cg05625834" "cg17277896" "cg26787863" "cg00075975" "cg07005353"
## [16] "cg00577847" "cg07117596" "cg14520947" "cg23486701" "cg22107147"
## [21] "cg20043649" "cg08818866" "cg08847775" "cg12077754" "cg27239243"
## [26] "cg10560561" "cg05977696" "cg23367358" "cg10018933" "cg21926118"
## [31] "cg07613752" "cg01450522" "cg24590723" "cg03865925" "cg18342703"
## [36] "cg21425296" "cg00584840" "cg23174148" "cg22539744" "cg01377807"
## [41] "cg04319611" "cg11578055" "cg15634747" "cg23971987" "cg01923775"
## [46] "cg05626226" "cg03464224" "cg00796302" "cg26728709" "cg08988420"
## [51] "cg08830157" "cg18430895" "cg08609445" "cg16016281" "cg11821200"
## [56] "cg24606701" "cg27351783" "cg21566177" "cg03172688" "cg10505610"
## [61] "cg11584098" "cg19692192" "cg05165087" "cg07186962" "cg08015382"
## [66] "cg27565517" "cg15094236" "cg12663302" "cg23381884" "cg14493742"
## [71] "cg09305161" "cg10846969" "cg23226631" "cg10784258" "cg22443940"
## [76] "cg19929194" "cg12964881" "cg24645679" "cg00084338" "cg17287122"
## [81] "cg27246129" "cg09022230" "cg00772192" "cg02498579" "cg08828787"
## [86] "cg14033170" "cg18100008" "cg09422239" "cg15059474" "cg24300607"
## [91] "cg19935531" "cg14199261" "cg05921889" "cg05744395" "cg18991321"
## [96] "cg00987534" "cg02901002" "cg12018521" "cg01824603" "cg10087556"
## [101] "cg14533298" "cg17291136" "cg04237288" "cg18336854" "cg12726213"
## [106] "cg24027477" "cg15235922" "cg07273304" "cg26895804" "cg10131026"
## [111] "cg14325112" "cg03489427" "cg10063407" "cg01364755" "cg14509153"
## [116] "cg10476206" "cg17568611" "cg23681599" "cg10494981" "cg25227803"
## [121] "cg27065896" "cg11688410" "cg17425818" "cg04337594" "cg18203466"
## [126] "cg10955669" "cg19754622" "cg13608733" "cg26725502" "cg16311302"
## [131] "cg15722533" "cg20482760" "cg23091777" "cg03078767" "cg02856190"
## [136] "cg25015613" "cg15110243" "cg18246298" "cg02159731" "cg06978117"
## [141] "cg23327896" "cg16182447" "cg26540315" "cg18248586" "cg03554573"
## [146] "cg21127079" "cg21672401" "cg21738443" "cg25757470" "cg17046586"
## [151] "cg19091779" "cg01579458" "cg10621809" "cg17066349" "cg27614751"
## [156] "cg26545245" "cg04874286" "cg11658060" "cg13712197" "cg17843665"
## [161] "cg20360395" "cg16378261" "cg12194594" "cg16536610" "cg12128696"
## [166] "cg13995599" "cg02909176" "cg16217297" "cg07800658" "cg15935770"
## [171] "cg23639055" "cg14850604" "cg21207694" "cg19708418" "cg13594138"
## [176] "cg21523688" "cg04094640" "cg09113939" "cg07092029" "cg07098502"
## [181] "cg10020890" "cg00599850" "cg13688786" "cg04726374" "cg04136369"
## [186] "cg08659204" "cg03828329" "cg04770088" "cg04603419" "cg17415265"
## [191] "cg00777445" "cg19078037" "cg00423871" "cg09926389" "cg18788725"
## [196] "cg07604565" "cg26162564" "cg27403406" "cg21211039" "cg01327767"
## [201] "cg13758960" "cg05321038" "cg00389552"
<<<<<<< HEAD
motif.result.file <- file.path(dir.out,"getMotif.hypo.motif.enrichment.csv")
=======
motif.result.file <- "analysis/primary_diagnosis-Adenocarcinoma..NOS_vs_Squamous.cell.carcinoma..NOS_supervised/hypo//getMotif.hypo.motif.enrichment.csv"
>>>>>>> 736b28b463b1fdd25bcd2659c86c33517475f6c4
motif.enrichment.plot(motif.enrichment = motif.result.file,
significant = list(lowerOR = 2.0),
label = "hypo",
summary = TRUE,
save = FALSE)
<<<<<<< HEAD

=======

>>>>>>> 736b28b463b1fdd25bcd2659c86c33517475f6c4
## TableGrob (2 x 2) "arrange": 2 grobs
## z cells name grob
## 1 1 (1-1,1-1) arrange gtable[layout]
## 2 2 (1-1,2-2) arrange gtable[layout]
Inferring MR TF binding to hypomethylated regions
Now that we have our enriched motifs, we have the following question: - Which is the candidate master regulator Transcription factor binding to those regions ?
It is important to highlight that since TF within the same family and subfamily have very close motifs, they will likely be indentified to bind the same regions. You can check that by verifying the intersection of probes from our enrichment analysis results.
## [1] 203
## [1] 189
## [1] 147
So more than 70% of probes are enriched for both GATA4 and GATA6. How could we infer which of those MR might be the one with higher impact ?
For that ELMER correlates the TF expression vs the average mean methylation of the hypomethylated probes with a motif signature and rank them. We expect that a TF with higher expression and lower methylation in the binding regions will be a better MR TF candidate than TF that are for example equally expressed on both groups.
# time expected ~5 minutes
TF <- get.TFs(data = mae,
group.col = group.col,
mode = mode,
group1 = group1,
group2 = group2,
diff.dir = direction,
enriched.motif = enriched.motif,
dir.out = dir.out,
label = direction,
cores = cores)## Using pre-defined groups. U (unmethylated): Adenocarcinoma, NOS, M (methylated): Squamous cell carcinoma, NOS
## -------------------------------------------------------------------------------------------------------------------
## ** Downloading TF list from Lambert, Samuel A., et al. The human transcription factors. Cell 172.4 (2018): 650-665.
## -------------------------------------------------------------------------------------------------------------------
## Accessing TF families from TFClass database to indentify known potential TF
## Retrieving TFClass family classification from ELMER.data.
## Retrieving TFClass subfamily classification from ELMER.data.
## Calculating the average methylation at all motif-adjacent probes
## Progress disabled when using parallel plyr
## Performing Mann-Whitney U test
## Progress disabled when using parallel plyr
## Finding potential TF and known potential TF
## Progress disabled when using parallel plyr
## motif top.potential.TF.family
## GATA4_HUMAN.H11MO.0.A GATA4_HUMAN.H11MO.0.A GATA6
## top.potential.TF.subfamily potential.TF.family
## GATA4_HUMAN.H11MO.0.A GATA6 GATA6;GATA4
## potential.TF.subfamily
## GATA4_HUMAN.H11MO.0.A GATA6;GATA4
Visualizing the results with TF ranking plot
ELMER provides an easy way to visualize the rank of all human TF evaluated with the function TF.rank.plot.
Summarizing several ELMER analysis
MR TF heatmap
When you perform several ELMER analysis with different arguments, you might want to summarize and visualize those several results. For that, we have the ELMER:::get.tabs function , which accepts as input the directories of the results and if you want to select the MR TF within the family of subfamily classification.
In the Data/analysis_april_2019 folder we provided 4 ELMER analysis. The first step is to retrieve the directory names with the results. Also, it is a golden rule when running several ELMER runs to add a understable name to the dir.out directories, such that you easily know which analysis were ran, with which arguments. In the example below, you can identify the tumor type, type of ELMER run (supervised/unsupervised), groups compared and direction of the analysis (hypo or hyper in group 1).
In the directory analysis_april_2019 we provide 4 results from different ELMER runs.
analsysis.dir <- unique(dirname(dir("Data/analysis_april_2019/",recursive = T,full.names = T)))
analsysis.dir## [1] "Data/analysis_april_2019//COAD-READ/unsupervised/definition-Primary.solid.Tumor_vs_Solid.Tissue.Normal/hypo"
## [2] "Data/analysis_april_2019//ESCA/unsupervised/group-Adenocarcinoma.Primary.solid.Tumor_vs_Adenocarcinoma.Solid.Tissue.Normal/hypo"
## [3] "Data/analysis_april_2019//ESCA/unsupervised/group-Adenocarcinoma.Primary.solid.Tumor_vs_Squamous.cell.carcinoma.Primary.solid.Tumor/hypo"
## [4] "Data/analysis_april_2019//PAAD/unsupervised/definition-Primary.solid.Tumor_vs_Solid.Tissue.Normal/hypo"
With those directory names we will use the function ELMER:::get.tabs to summarize the results.
analsysis.dir <- unique(dirname(dir("analysis_april_2019/",recursive = T,full.names = T)))
analsysis.dir## [1] "analysis_april_2019//COAD-READ/unsupervised/definition-Primary.solid.Tumor_vs_Solid.Tissue.Normal/hypo"
## [2] "analysis_april_2019//ESCA/unsupervised/group-Adenocarcinoma.Primary.solid.Tumor_vs_Adenocarcinoma.Solid.Tissue.Normal/hypo"
## [3] "analysis_april_2019//ESCA/unsupervised/group-Adenocarcinoma.Primary.solid.Tumor_vs_Squamous.cell.carcinoma.Primary.solid.Tumor/hypo"
## [4] "analysis_april_2019//PAAD/unsupervised/definition-Primary.solid.Tumor_vs_Solid.Tissue.Normal/hypo"
The funtion get.tabs will summarize the results in several tables
- tab: a binary matrix saying if the TF was found or not in the analysis
- tab.pval: a matrix with the MR TF p-value found in the analysis
- tab.or: a matrix with the highest motif OR for the MR TF found in the analysis
- tf.or.table: a matrix with 4 collums: MR TF, motif, FDR and analysis it was identified.
## o Creating TF binary matrix
## Joining, by = "TF"
## Joining, by = "TF"
## Joining, by = "TF"
## o Creating TF FDR matrix
## Joining, by = "motif"
## Joining, by = "motif"
## Joining, by = "motif"
## Joining, by = "motif"
## o Creating TF OR matrix
## Joining, by = "motif"
## Joining, by = "motif"
## Joining, by = "motif"
## Joining, by = "motif"
## Joining, by = "motif"
## Joining, by = "motif"
## Joining, by = "motif"
## Joining, by = "motif"
## [1] "tab" "tab.pval" "tab.or" "tf.or.table"
# To be more readable we will change the path names to the analysis name
analysis.names <- c( "COAD-READ (vs Normal)",
"EAC (vs Normal)",
"EAC (vs ESCC)",
"PAAD (vs Normal)")
<<<<<<< HEAD
for (i in 1:3) colnames(tabs[[i]]) <- analysis.namesThis function creates 4 matrices using some of the files in each analysis directory: 1. a binary matrix saying if the TF was found or not in the analysis: this function parses each getTF.hypo.significant.TFs.with.motif.summary.csv (create after you run the get.TFs function) file that has the MR TF for each enriched motif.
# 1. a binary matrix saying if the TF was found or not in the analysis
tab.binary <- tabs$tab
head(tab.binary)## COAD-READ (vs Normal) EAC (vs Normal) EAC (vs ESCC)
## ARID3A 1 0 0
## FOXQ1 1 0 0
## HNF1A 1 1 1
## PDX1 1 0 0
## POU5F1B 1 0 0
## SOX9 1 0 0
## PAAD (vs Normal)
## ARID3A 0
## FOXQ1 0
## HNF1A 1
## PDX1 0
## POU5F1B 0
## SOX9 1
<<<<<<< HEAD
- a matrix with the MR TF p-value found in the analysis: this uses
getTF.hypo.significant.TFs.with.motif.summary.csvto identify the MR TF andgetTF.hypo.TFs.with.motif.pvalue.rda, which is a matrix with TF as columns and enriched motifs as rows and the contains the p-value of the wilcoxon test of the TF expression for the Unmethylated vs Methylated groups.
## COAD-READ (vs Normal) EAC (vs Normal) EAC (vs ESCC)
## ARID3A 8.397450e-08 NA NA
## FOXQ1 3.774568e-08 NA NA
## HNF1A 5.845199e-10 0.007700548 1.689635e-09
## PDX1 3.243946e-09 NA NA
## POU5F1B 1.259413e-10 NA NA
## SOX9 8.410768e-08 NA NA
## PAAD (vs Normal)
## ARID3A NA
## FOXQ1 NA
## HNF1A 1.777426e-07
## PDX1 NA
## POU5F1B NA
## SOX9 2.017003e-07
<<<<<<< HEAD
- a matrix with the highest motif OR for the MR TF found in the analysis: this function uses the file
getTF.hypo.significant.TFs.with.motif.summary.csvto get the MR TF and then usesgetMotif.hypo.motif.enrichment.csvto get the motifs OR to which the MR TF is known to bind and returns the highest OR.
# 3. a matrix with the highest motif OR for the MR TF found in the analysis
tab.or <- tabs$tab.or
head(tab.or)# 3. a matrix with the highest motif OR for the MR TF found in the analysis
tab.or <- tabs$tab.or
head(tab.or)## COAD-READ (vs Normal) EAC (vs Normal) EAC (vs ESCC)
## ARID3A 1.348007 NA NA
## FOXQ1 1.362852 NA NA
## HNF1A 1.471907 1.501869 2.190133
## PDX1 1.730954 NA NA
## POU5F1B 1.786492 NA NA
## SOX9 1.689930 NA NA
## PAAD (vs Normal)
## ARID3A NA
## FOXQ1 NA
## HNF1A 1.960243
## PDX1 NA
## POU5F1B NA
## SOX9 1.684580
<<<<<<< HEAD
- a matrix with 4 collums: MR TF, motif, FDR and analysis it was identified: This function uses the same files as the previous matrices, but returns the results in a different format.
# 4. a matrix with 4 collums: MR TF, motif, FDR and analysis it was identified.
tf.or.table <- tabs$tf.or.table
# Rename analysis
tf.or.table$analysis <- gsub("ESCA","EAC",paste0(basename(dirname(dirname(dirname(tf.or.table$analysis)))),
" (vs ",
ifelse(grepl("Normal",tf.or.table$analysis),"Normal","ESCC"),")"))
head(tf.or.table)# 4. a matrix with 4 collums: MR TF, motif, FDR and analysis it was identified.
tf.or.table <- tabs$tf.or.table
# Rename analysis
tf.or.table$analysis <- gsub("ESCA","EAC",paste0(basename(dirname(dirname(dirname(tf.or.table$analysis)))),
" (vs ",
ifelse(grepl("Normal",tf.or.table$analysis),"Normal","ESCC"),")"))
head(tf.or.table)## TF motif FDR analysis
## 7 ZNF263 ZIM3_HUMAN.H11MO.0.C 7.120537e-11 COAD-READ (vs Normal)
## 5 POU5F1B P5F1B_HUMAN.H11MO.0.D 1.259413e-10 COAD-READ (vs Normal)
## 8 ZSCAN16 ZNF85_HUMAN.H11MO.0.C 3.818943e-10 COAD-READ (vs Normal)
## 3 HNF1A HNF1B_HUMAN.H11MO.1.A 5.845199e-10 COAD-READ (vs Normal)
## 17 HNF1A HNF1B_HUMAN.H11MO.1.A 1.689635e-09 EAC (vs ESCC)
## 21 FOXA3 FOXA1_HUMAN.H11MO.0.A 1.806201e-09 EAC (vs ESCC)
<<<<<<< HEAD
Creating Summary plots
With those summarized results we can create two summary plots:
- A heatmap with the MR TFs identified in each analysis.
- A scatter plot of the TF expression vs the avg DNA methylation of the TFBS.
MR TF heatmap
The code below creates the heatmap using complexHeatmap package for the binary MR TF matrix and the p-value one.
library(ComplexHeatmap)
library(vegan)
col <- ifelse(classification == "family","top.potential.TF.family", "top.potential.TF.subfamily")
analysis.colors <- c(
"COAD-READ (vs Normal)" = '#ff964f',
"EAC (vs Normal)" = '#94e894',
"EAC (vs ESCC)" = '#eaadb9',
"PAAD (vs Normal)" = '#9ab9f9'
)
# get top TFs in each anaylsis
labels <- ELMER:::get.top.tf.by.pval(tab.pval,top = nrow(tab.pval))
hb = HeatmapAnnotation(df = data.frame("Mode" = rep("unsupervised",4),
"Analysis" = analysis.names),
col = list("Analysis" = analysis.colors,
"Mode" = c("unsupervised" = "#ebd540",
"supervised" = "#718da5")),
show_annotation_name = FALSE,
show_legend = T,
annotation_name_side = "left",
annotation_name_gp = gpar(fontsize = 12))
# Binary heatmap
ht_list_binary <- Heatmap(as.matrix(tab.binary),
name = "Binary heatmap" ,
col = c("1" = "black", "0" = "white"),
column_names_gp = gpar(fontsize = 8),
show_column_names = FALSE,
use_raster = TRUE,
na_col = "white",
raster_device = c("png"),
raster_quality = 2,
show_row_names = F,
show_heatmap_legend = TRUE,
cluster_columns = FALSE,
cluster_rows = TRUE,
show_column_dend = FALSE,
show_row_dend = FALSE,
clustering_distance_rows = function(x) {
vegan::vegdist(tab.binary,method = "jaccard",binary = TRUE)
},
clustering_method_rows = "average",
top_annotation = hb,
width = unit(5, "cm"),
row_names_gp = gpar(fontsize = 1),
column_title = paste0("Binrary MR TF Heatmap"),
column_title_gp = gpar(fontsize = 10),
row_title_gp = gpar(fontsize = 16))
ht_list_binary <- ht_list_binary +
rowAnnotation(link = anno_mark(at = match(labels,rownames(tab.pval)),
labels = labels,labels_gp = gpar(fontsize = 10)),
width = unit(1, "cm") + max_text_width(labels)
)
draw(ht_list_binary,
newpage = FALSE,
column_title_gp = gpar(fontsize = 12, fontface = "bold"),
heatmap_legend_side = "left",
show_heatmap_legend = TRUE,
annotation_legend_side = "right")# P-value heatmap
ht_list <- Heatmap(scale(as.matrix(-log10(tab.pval))),
name = "column z-score FDR",
col = colorRampPalette(c('#3361A5', '#248AF3', '#14B3FF',
'#88CEEF', '#C1D5DC', '#EAD397',
'#FDB31A','#E42A2A', '#A31D1D'))(100),
column_names_gp = gpar(fontsize = 8),
show_column_names = FALSE,
use_raster = TRUE,
na_col = "white",
raster_device = c("png"),
raster_quality = 2,
show_row_names = F,
show_heatmap_legend = TRUE,
cluster_columns = FALSE,
cluster_rows = FALSE,
row_order = row_order(ht_list_binary),
top_annotation = hb,
width = unit(5, "cm"),
row_names_gp = gpar(fontsize = 1),
column_title = paste0("MR TF Heatmap"),
column_title_gp = gpar(fontsize = 10),
row_title_gp = gpar(fontsize = 16))
ht_list <- ht_list +
rowAnnotation(link = anno_mark(at = match(labels,rownames(tab.pval)),
labels = labels,labels_gp = gpar(fontsize = 10)),
width = unit(1, "cm") + max_text_width(labels)
)
draw(ht_list,
newpage = TRUE,
column_title_gp = gpar(fontsize = 12, fontface = "bold"),
heatmap_legend_side = "left",
show_heatmap_legend = TRUE,
annotation_legend_side = "right")MR TF expression vs DNA methylation at binding motifs
The main difference from the last plot is the requirement of the MAE object with the DNA methylation and gene expression data. For a given MR TF, we need to indentify the top enriched motif it binds and then for each enriched motifs its distal paired differently methylated probes using the getMotif.hypo.enriched.motifs.rda object within the results folder.
Also, since we only provide one object used for two analysis (“EAC (vs Normal)” and “EAC (vs ESCC)” ) we will only plot the MR TF expression vs DNA methylation at binding motifs for them.
analsysis.dir.esca <- unique(dirname(dir("Data/analysis_april_2019/ESCA/",recursive = T,full.names = T)))
mae <- readRDS("Data/TCGA_ESCA_MAE_distal_regions.rds")
tab.pval.esca <- tab.pval[,2:3]
suppressMessages({
library(ELMER)
library(grid)
library(ggplot2)
library(ggpubr)
library(dplyr)
})
analysis.colors <- c(
"COAD-READ (vs Normal)" = '#ff964f',
"EAC (vs Normal)" = '#94e894',
"EAC (vs ESCC)" = '#eaadb9',
"PAAD (vs Normal)" = '#9ab9f9'
)
# Top has the number of MRTF to plot
for (top in c(5)) {
scatter.list <- list()
scatter.grid <- list()
scatter.labels <- c()
for (path in analsysis.dir.esca) {
# Load of object with enriched motifs and the probes mapped to it
load(dir(path = path,
pattern = "getMotif.hypo.enriched.motifs.rda",
recursive = T,full.names = T)
)
i <- which(path == analsysis.dir.esca)
tumor <- basename(dirname(dirname(dirname(path))))
mae.plot <- mae
# Change the data to plot based on the analysis performed
if (grepl("Squamous",path)) {
category <- "primary_diagnosis"
# keep only samples used in the analysis
mae.plot <- mae.plot[,mae.plot$primary_diagnosis %in% c("Adenocarcinoma, NOS",
"Squamous cell carcinoma, NOS") &
mae.plot$definition %in% c("Primary solid Tumor")]
} else {
category <- "definition"
# keep only samples used in the analysis
mae.plot <- mae.plot[,mae.plot$primary_diagnosis %in% c("Adenocarcinoma, NOS") &
mae.plot$definition %in% c("Primary solid Tumor","Solid Tissue Normal")]
}
# get top Tfs
topTFs <- rownames(tab.pval)[head(sort(DelayedArray::rowMins(tab.pval.esca[,i,drop = F],na.rm = T),
index.return = TRUE, decreasing = F)$ix, n = top)]
topTFs <- topTFs[1:top]
for (j in topTFs) {
TF <- j
scatter.labels <- c(scatter.labels,colnames(tab.pval.esca)[i])
motif <- tf.or.table[tf.or.table$TF == j &
tf.or.table$analysis == colnames(tab.pval.esca)[i],"motif"][1]
# Change the dots color based on the analysis performed
if (grepl("Squamous",path)) color.value <- c("#FF0000","#0000FF")
if (grepl("Adenocarcinoma.Solid.Tissue.Normal",path)) color.value <- c("#FF0000","#176d55")
if (is.na(motif)) next
# Plot TF expression vs avg DNA methylation of the TFBS
suppressMessages({
scatter <- scatter.plot(data = mae.plot,
correlation = TRUE,
byTF = list(TF = TF,
probe = enriched.motif[[motif]]),
category = category,
ylim = c(0,25),
dots.size = 0.8,
color.value = color.value,
save = FALSE,
lm_line = TRUE) +
ylab(bquote(.(TF) ~ log[2](RSEM + 1))) +
xlab(paste0("Avg. DNA met. at \n", length(enriched.motif[[motif]]),
" CpGs w/ \n enriched motif ",
sub("_HUMAN.H11MO.[0-1].[A-D]", "", motif))) +
theme(legend.position = "none",
plot.title = element_blank(),
axis.title.y = element_text(size = 8, face = "bold"),
axis.title.x = element_text(size = 8, face = "bold"),
strip.background = element_rect(fill = analysis.colors[i + 1]),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
# Remove panel background
panel.background = element_blank()
)
})
# If this is the first plot of the analysis we will add a labels (rectangle + text)
if(j == topTFs[1]){
gt <- gtable::gtable(widths = unit(0.5,"cm"), heights = unit(3, "in")) %>%
gtable::gtable_add_grob(grid::rectGrob(gp=gpar(fill= analysis.colors[i + 1])), 1, 1, name = 1) %>%
gtable::gtable_add_grob(grid::textGrob(colnames(tab.pval.esca)[i],
rot = 90,
gp = grid::gpar(fontsize = 8)), 1, 1, name = 2)
scatter <- scatter %>% ggpubr::annotate_figure(left = gt)
}
scatter.list[[paste0(i,j)]] <- scatter
}
scatter.grid[[i]] <- ggpubr::ggarrange(plotlist = scatter.list[grep(paste0("^",i),names(scatter.list))],
ncol = top,
# the first column should be wider since it has the label
widths = c(1.15,rep(1,top - 1)),
nrow = 1,
common.legend = T,
legend = "bottom")
}
}
options(repr.plot.width=20, repr.plot.height=8)
# we use ggarrange to arrange the plots together
ggpubr::ggarrange(plotlist = scatter.grid,
ncol = 1,
# the first column should be wider since it has the label
widths = c(1.15,rep(1,top - 1)),
nrow = 2, # Number of analysis - rows of the plot
heights = c(rep(1.5,length(scatter.list) - 1), 2),
common.legend = F,
legend = "bottom")library(ComplexHeatmap)
library(vegan)
col <- ifelse(classification == "family","top.potential.TF.family", "top.potential.TF.subfamily")
analysis.colors <- c(
"COAD-READ (vs Normal)" = '#ff964f',
"EAC (vs Normal)" = '#94e894',
"EAC (vs ESCC)" = '#eaadb9',
"PAAD (vs Normal)" = '#9ab9f9'
)
# get top TFs in each anaylsis
labels <- ELMER:::get.top.tf.by.pval(tab.pval,top = nrow(tab.pval))
hb = HeatmapAnnotation(df = data.frame("Mode" = rep("unsupervised",4),
"Analysis" = analysis.names),
col = list("Analysis" = analysis.colors,
"Mode" = c("unsupervised" = "#ebd540",
"supervised" = "#718da5")),
show_annotation_name = FALSE,
show_legend = T,
annotation_name_side = "left",
annotation_name_gp = gpar(fontsize = 12))
# Binary heatmap
ht_list <- Heatmap(as.matrix(tab.binary),
name = "Binary heatmap" ,
col = c("1" = "black", "0" = "white"),
column_names_gp = gpar(fontsize = 8),
show_column_names = FALSE,
use_raster = TRUE,
na_col = "white",
raster_device = c("png"),
raster_quality = 2,
show_row_names = F,
show_heatmap_legend = TRUE,
cluster_columns = FALSE,
cluster_rows = TRUE,
clustering_distance_rows = function(x) {
vegan::vegdist(tab.binary,method = "jaccard",binary = TRUE)
},
clustering_method_rows = "average",
top_annotation = hb,
width = unit(5, "cm"),
row_names_gp = gpar(fontsize = 1),
column_title = paste0("Binrary MR TF Heatmap"),
column_title_gp = gpar(fontsize = 10),
row_title_gp = gpar(fontsize = 16))
ht_list <- ht_list +
rowAnnotation(link = anno_mark(at = match(labels,rownames(tab.pval)),
labels = labels,labels_gp = gpar(fontsize = 10)),
width = unit(1, "cm") + max_text_width(labels)
)
draw(ht_list,
newpage = FALSE,
column_title_gp = gpar(fontsize = 12, fontface = "bold"),
heatmap_legend_side = "right",
show_heatmap_legend = TRUE,
annotation_legend_side = "right")# Pvalue heatmap
ht_list <- Heatmap(as.matrix(-log10(tab.pval)),
name = "-log10 (FDR)" ,
col = colorRampPalette(c('#3361A5', '#248AF3', '#14B3FF',
'#88CEEF', '#C1D5DC', '#EAD397',
'#FDB31A','#E42A2A', '#A31D1D'))(100),
column_names_gp = gpar(fontsize = 8),
show_column_names = FALSE,
use_raster = TRUE,
na_col = "white",
raster_device = c("png"),
raster_quality = 2,
show_row_names = F,
show_heatmap_legend = TRUE,
cluster_columns = FALSE,
cluster_rows = TRUE,
clustering_distance_rows = function(x) {
vegan::vegdist(tab.binary,method = "jaccard",binary = TRUE)
},
clustering_method_rows = "average",
top_annotation = hb,
width = unit(5, "cm"),
row_names_gp = gpar(fontsize = 1),
column_title = paste0("MR TF Heatmap"),
column_title_gp = gpar(fontsize = 10),
row_title_gp = gpar(fontsize = 16))
ht_list <- ht_list +
rowAnnotation(link = anno_mark(at = match(labels,rownames(tab.pval)),
labels = labels,labels_gp = gpar(fontsize = 10)),
width = unit(1, "cm") + max_text_width(labels)
)
draw(ht_list,
newpage = TRUE,
column_title_gp = gpar(fontsize = 12, fontface = "bold"),
heatmap_legend_side = "right",
show_heatmap_legend = TRUE,
annotation_legend_side = "right")MR TF expression vs DNA methylation at binding motifs
Since we only provide one object used for two analysis (“EAC (vs Normal)” and “EAC (vs ESCC)” ) we will only plot the MR TF expression vs DNA methylation at binding motifs for them.
analsysis.dir.esca <- unique(dirname(dir("analysis_april_2019/ESCA/",recursive = T,full.names = T)))
mae <- readRDS("TCGA_ESCA_MAE_distal_regions.rds")
tab.pval.esca <- tab.pval[,2:3]
library(ELMER)
library(grid)
library(ggplot2)
library(ggpubr)
library(dplyr)
for (top in c(5)) {
scatter.list <- list()
scatter.grid <- list()
scatter.labels <- c()
for (path in analsysis.dir.esca) {
# Load of object with enriched motifs
load(dir(path = path,
pattern = "getMotif.hypo.enriched.motifs.rda",
recursive = T,full.names = T)
)
i <- which(path == analsysis.dir.esca)
tumor <- basename(dirname(dirname(dirname(path))))
mae.plot <- mae
if (grepl("Squamous",path)) {
category <- "primary_diagnosis"
# keep only samples used in the analysis
mae.plot <- mae.plot[,mae.plot$primary_diagnosis %in% c("Adenocarcinoma, NOS",
"Squamous cell carcinoma, NOS") &
mae.plot$definition %in% c("Primary solid Tumor")]
} else {
category <- "definition"
# keep only samples used in the analysis
mae.plot <- mae.plot[,mae.plot$primary_diagnosis %in% c("Adenocarcinoma, NOS") &
mae.plot$definition %in% c("Primary solid Tumor","Solid Tissue Normal")]
}
# get top Tfs
topTFs <- rownames(tab.pval)[head(sort(DelayedArray::rowMins(tab.pval.esca[,i,drop = F],na.rm = T),
index.return = TRUE, decreasing = F)$ix, n = top)]
topTFs <- topTFs[1:top]
for (j in topTFs) {
TF <- j
scatter.labels <- c(scatter.labels,colnames(tab.pval.esca)[i])
motif <- tf.or.table[tf.or.table$TF == j &
tf.or.table$analysis == colnames(tab.pval.esca)[i],"motif"][1]
if (grepl("Squamous",path)) color.value <- c("#FF0000","#0000FF")
if (grepl("Adenocarcinoma.Solid.Tissue.Normal",path)) color.value <- c("#FF0000","#176d55")
if (is.na(motif)) next
scatter <- scatter.plot(data = mae.plot,
correlation = TRUE,
byTF = list(TF = TF,
probe = enriched.motif[[motif]]),
category = category,
ylim = c(0,25),
dots.size = 0.8,
color.value = color.value,
save = FALSE,
lm_line = TRUE) +
ylab(bquote(.(TF) ~ log[2](RSEM + 1))) +
xlab(paste0("Avg. DNA met. at \n", length(enriched.motif[[motif]]),
" CpGs w/ \n enriched motif ",
sub("_HUMAN.H11MO.[0-1].[A-D]", "", motif))
) + theme(legend.position = "none",
plot.title = element_blank(),
axis.title.y = element_text(size = 8, face = "bold"),
axis.title.x = element_text(size = 8, face = "bold"),
strip.background = element_rect(fill = analysis.colors[i + 1]),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
# Remove panel background
panel.background = element_blank()
)
if(j == topTFs[1]){
gt <- gtable::gtable(widths = unit(0.5,"cm"), heights = unit(3, "in")) %>%
gtable::gtable_add_grob(grid::rectGrob(gp=gpar(fill= analysis.colors[i + 1])), 1, 1, name = 1) %>%
gtable::gtable_add_grob(grid::textGrob(colnames(tab.pval.esca)[i],
rot = 90,
gp = grid::gpar(fontsize = 8)), 1, 1, name = 2)
scatter <- scatter %>% ggpubr::annotate_figure(left = gt)
}
scatter.list[[paste0(i,j)]] <- scatter
}
scatter.grid[[i]] <- ggpubr::ggarrange(plotlist = scatter.list[grep(paste0("^",i),names(scatter.list))],
ncol = top,
# the first column should be wider since it has the label
widths = c(1.15,rep(1,top - 1)),
nrow = 1,
common.legend = T,
legend = "bottom")
}
}
ggpubr::ggarrange(plotlist = scatter.grid,
ncol = 1,
# the first column should be wider since it has the label
widths = c(1.15,rep(1,top - 1)),
nrow = 2, # Number of analysis
heights = c(rep(1.5,length(scatter.list) - 1), 2),
#labels="AUTO", # This will add A B C D etc. to the plot
common.legend = F,
legend = "bottom")Code to create the data used in the workshop
The function getTCGA uses TCGAbiolinks to retrieve TCGA data in ELMER accept format.
genome <- "hg38"
# Get RNA-seq and DNA methylation from GDC using TCGAbiolinks
# Data will be saved in Data folder by default
getTCGA(disease = "ESCA", genome = genome, Meth = TRUE,RNA = TRUE)
# We will keep only the distal probes
distal.probes <- get.feature.probe(feature = NULL,
genome = genome,
met.platform = "450K")
mae <- createMAE(exp = paste0("Data/",tumor,"/",tumor,"_RNA_hg38_no_filter.rda"),
met = paste0("Data/",tumor,"/",tumor,"_Meth_hg38_no_filter.rda"),
met.platform = "450K",
genome = "hg38",
linearize.exp = TRUE,
filter.probes = distal.probes,
save = FALSE,
TCGA = TRUE)
mae$group <- paste0(mae$name,"-",mae$definition)
mae <- mae[,!mae$is_ffpe] # remove FFPE samples from analysis
saveRDS(object = mae, file = "TCGA_ESCA_MAE_distal_regions.rds")Session information
<<<<<<< HEAD ======= >>>>>>> 736b28b463b1fdd25bcd2659c86c33517475f6c4## R version 3.6.0 (2019-04-26)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Mojave 10.14.5
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] grid parallel stats4 stats graphics grDevices utils
## [8] datasets methods base
##
## other attached packages:
## [1] dplyr_0.8.3 ggpubr_0.2.1
## [3] magrittr_1.5 ggplot2_3.2.0
## [5] vegan_2.5-5 lattice_0.20-38
## [7] permute_0.9-5 ComplexHeatmap_2.0.0
## [9] MultiAssayExperiment_1.10.4 SummarizedExperiment_1.14.0
## [11] DelayedArray_0.10.0 BiocParallel_1.18.0
## [13] matrixStats_0.54.0 Biobase_2.44.0
## [15] GenomicRanges_1.36.0 GenomeInfoDb_1.20.0
## [17] IRanges_2.18.1 S4Vectors_0.22.0
## [19] BiocGenerics_0.30.0 ELMER_2.9.2
## [21] ELMER.data_2.9.3
##
## loaded via a namespace (and not attached):
<<<<<<< HEAD
## [1] utf8_1.1.4 R.utils_2.9.0
## [3] tidyselect_0.2.5 RSQLite_2.1.1
## [5] AnnotationDbi_1.46.0 htmlwidgets_1.3
## [7] DESeq_1.36.0 munsell_0.5.0
## [9] codetools_0.2-16 withr_2.1.2
## [11] colorspace_1.4-1 knitr_1.23
## [13] rstudioapi_0.10 ggsignif_0.5.0
## [15] labeling_0.3 GenomeInfoDbData_1.2.1
## [17] hwriter_1.3.2 KMsurv_0.1-5
## [19] bit64_0.9-7 downloader_0.4
## [21] vctrs_0.2.0 generics_0.0.2
## [23] xfun_0.8 biovizBase_1.32.0
## [25] ggthemes_4.2.0 BiocFileCache_1.8.0
## [27] EDASeq_2.18.0 R6_2.4.0
## [29] doParallel_1.0.14 clue_0.3-57
## [31] locfit_1.5-9.1 AnnotationFilter_1.8.0
## [33] bitops_1.0-6 reshape_0.8.8
## [35] assertthat_0.2.1 scales_1.0.0
## [37] nnet_7.3-12 gtable_0.3.0
## [39] sva_3.32.1 ensembldb_2.8.0
## [41] rlang_0.4.0 zeallot_0.1.0
## [43] genefilter_1.66.0 cmprsk_2.2-8
## [45] GlobalOptions_0.1.0 splines_3.6.0
## [47] rtracklayer_1.44.0 lazyeval_0.2.2
## [49] acepack_1.4.1 dichromat_2.0-0
## [51] selectr_0.4-1 broom_0.5.2
## [53] checkmate_1.9.4 yaml_2.2.0
## [55] GenomicFeatures_1.36.4 backports_1.1.4
## [57] Hmisc_4.2-0 tools_3.6.0
## [59] RColorBrewer_1.1-2 Rcpp_1.0.1
## [61] plyr_1.8.4 base64enc_0.1-3
## [63] progress_1.2.2 zlibbioc_1.30.0
## [65] purrr_0.3.2 RCurl_1.95-4.12
## [67] prettyunits_1.0.2 rpart_4.1-15
## [69] openssl_1.4.1 GetoptLong_0.1.7
## [71] cowplot_1.0.0 zoo_1.8-6
## [73] ggrepel_0.8.1 cluster_2.1.0
## [75] data.table_1.12.2 circlize_0.4.6
## [77] survminer_0.4.4 ProtGenerics_1.16.0
## [79] aroma.light_3.14.0 hms_0.5.0
## [81] evaluate_0.14 xtable_1.8-4
## [83] XML_3.98-1.20 gridExtra_2.3
## [85] shape_1.4.4 compiler_3.6.0
## [87] biomaRt_2.41.7 tibble_2.1.3
## [89] crayon_1.3.4 R.oo_1.22.0
## [91] htmltools_0.3.6 mgcv_1.8-28
## [93] Formula_1.2-3 tidyr_0.8.3
## [95] geneplotter_1.62.0 DBI_1.0.0
## [97] dbplyr_1.4.2 matlab_1.0.2
## [99] MASS_7.3-51.4 rappdirs_0.3.1
## [101] ShortRead_1.42.0 Matrix_1.2-17
## [103] readr_1.3.1 cli_1.1.0
## [105] R.methodsS3_1.7.1 Gviz_1.28.0
## [107] pkgconfig_2.0.2 km.ci_0.5-2
## [109] prettydoc_0.2.1 GenomicAlignments_1.20.1
## [111] foreign_0.8-71 plotly_4.9.0
## [113] xml2_1.2.0 foreach_1.4.4
## [115] annotate_1.62.0 XVector_0.24.0
## [117] rvest_0.3.4 stringr_1.4.0
## [119] VariantAnnotation_1.30.1 digest_0.6.20
## [121] ConsensusClusterPlus_1.48.0 Biostrings_2.52.0
## [123] rmarkdown_1.13 TCGAbiolinks_2.13.3
## [125] survMisc_0.5.5 htmlTable_1.13.1
## [127] edgeR_3.26.5 curl_3.3
## [129] Rsamtools_2.0.0 rjson_0.2.20
## [131] nlme_3.1-140 jsonlite_1.6
## [133] viridisLite_0.3.0 askpass_1.1
## [135] limma_3.40.2 BSgenome_1.52.0
## [137] fansi_0.4.0 pillar_1.4.2
## [139] httr_1.4.0 survival_2.44-1.1
## [141] glue_1.3.1 png_0.1-7
## [143] iterators_1.0.10 bit_1.1-14
## [145] stringi_1.4.3 blob_1.2.0
## [147] latticeExtra_0.6-28 memoise_1.1.0
Workshop materials
Workshops HTMLs
- ELMER data Workshop HTML: http://rpubs.com/tiagochst/elmer-data-workshop-2019
- ELMER analysis Workshop HTML: http://rpubs.com/tiagochst/ELMER_workshop
- ATAC-seq Workshop HTML: http://rpubs.com/tiagochst/atac_seq_workshop
Workshop videos
We have a set of recorded videos, explaining some of the workshops.
- All videos playlist: https://www.youtube.com/playlist?list=PLoDzAKMJh15kNpCSIxpSuZgksZbJNfmMt
- ELMER algorithm: https://youtu.be/PzC31K9vfu0
- ELMER data: https://youtu.be/R00wG--tGo8
- ELMER analysis part1 : https://youtu.be/bcd4uyxrZCw
- ELMER analysis part2: https://youtu.be/vcJ_DSCt4Mo
- ELMER summarizing several analysis: https://youtu.be/moLeik7JjLk
- ATAC-Seq workshop: https://youtu.be/3ftZecz0lU4